FASTQ format is a text-based format for storing both a biological sequence (usually a nucleotide sequence) and its corresponding quality scores. The sequence letter and the quality score are each encoded with a single ASCII character for brevity. It was originally developed at the Wellcome Trust Sanger Institute to bundle a FASTA sequence and its quality data, but has become the ''de facto'' standard for storing the output of high-throughput sequencing instruments such as the Illumina Genome Analyzer.

==Format==
A FASTQ file normally uses four lines per sequence.
* Line 1 begins with a '@' character and is followed by a sequence identifier and an ''optional'' description (like a FASTA title line).
* Line 2 is the raw sequence letters.
* Line 3 begins with a '+' character and is ''optionally'' followed by the same sequence identifier (and any description) again.
* Line 4 encodes the quality values for the sequence in Line 2, and must contain the same number of symbols as letters in the sequence.
A FASTQ file containing a single sequence might look like this:
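As a minimal illustration of the four-line layout, the Python sketch below embeds a made-up single-record file and splits it into its parts, checking the rules listed above. The identifier, description, bases and quality string are invented for the example, the function name `parse_fastq_record` is hypothetical, and the sketch assumes the unwrapped layout with exactly four lines per record.

```python
# Made-up single-record FASTQ text: the identifier, description, bases and
# quality symbols are invented for illustration only.
EXAMPLE = "@seq1 hypothetical read\nGATTACAGATTACA\n+\n!''*((((***+)%\n"


def parse_fastq_record(text):
    """Split one unwrapped four-line FASTQ record into its parts."""
    lines = text.strip("\n").split("\n")
    if len(lines) != 4:
        raise ValueError("expected exactly four lines per record")
    header, sequence, separator, quality = lines

    if not header.startswith("@"):
        raise ValueError("line 1 must begin with '@'")
    if not separator.startswith("+"):
        raise ValueError("line 3 must begin with '+'")
    # Line 3 may optionally repeat the title from line 1; if it does, it must match.
    if len(separator) > 1 and separator[1:] != header[1:]:
        raise ValueError("line 3 repeats a different title than line 1")
    # Line 4 must carry one quality symbol per sequence letter.
    if len(quality) != len(sequence):
        raise ValueError("sequence and quality lengths differ")

    # The identifier is the first word of the title; the rest is the description.
    identifier, _, description = header[1:].partition(" ")
    return {
        "identifier": identifier,
        "description": description,
        "sequence": sequence,
        "quality": quality,
    }


record = parse_fastq_record(EXAMPLE)
print(record["identifier"], len(record["sequence"]))
```

The explicit length check mirrors the requirement that Line 4 contain the same number of symbols as Line 2.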
The character '!' represents the lowest quality while '~' is the highest. Here are the quality value characters in left-to-right increasing order of quality (ASCII):
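Because the ordering follows the ASCII codes, the run of characters can be regenerated instead of typed out. The Python sketch below does so, assuming the quality symbols span the printable range from '!' (code 33) to '~' (code 126). The names `QUALITY_CHARS` and `quality_rank` are hypothetical, and the rank is simply a position relative to '!', not a claim about how any particular instrument scales its scores.

```python
# All quality symbols in increasing order of quality, assuming the range
# runs from '!' (ASCII 33) to '~' (ASCII 126) inclusive.
QUALITY_CHARS = "".join(chr(code) for code in range(ord("!"), ord("~") + 1))


def quality_rank(symbol):
    """Return the position of a quality symbol, with '!' as rank 0."""
    if len(symbol) != 1 or not "!" <= symbol <= "~":
        raise ValueError("not a valid quality symbol: %r" % symbol)
    return ord(symbol) - ord("!")


print(QUALITY_CHARS)
print(quality_rank("!"), quality_rank("~"))
```

Comparing two symbols with `<` on the characters themselves gives the same ordering, since Python compares single-character strings by code point.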
The original Sanger FASTQ files also allowed the sequence and quality strings to be wrapped (split over multiple lines), but this is generally discouraged because it complicates parsing: the markers "@" and "+" can also occur in the quality string, so a line that starts with one of them is not necessarily a title or separator line.

Source: "FASTQ format", Wikipedia, the free encyclopedia.